Multilevel Origin-Destination Direct Ridership Model

Unconditional Model

$$Y_{ijk} = \theta_0 + b_{00j} + c_{00k} + d_{0jk} + e_{ijk}$$

where

$Y_{ijk}$ is the log-transformed ridership level for origin-destination pair $i$, with origin $j$ and destination $k$

$\theta_0$ is the grand mean ridership level for all OD pairs

$b_{00j}$ is the random main effect of origin j (the contribution of origin $j$ averaged across all destinations)

$c_{00k}$ is the random main effect of destination k (the contribution of destination $k$ averaged across all origins)

$d_{0jk}$ is the random interaction effect of origin $j$ and destination $k$

$e_{ijk}$ is the random OD effect (i.e. random error associated with each OD pair)

Conditional Model

The fully specified (combined) cross-classified model is given by Raudenbush and Bryk (2002, p. 384), as:

$$Y_{ijk} = \theta_0 + \theta_1 a_{ijk} + \beta_0 X_k + \gamma_0 W_j + \delta_0 X_k W_j + \beta_1 X_k a_{ijk} + \gamma_1 W_j a_{ijk} + \delta_1 X_k W_j a_{ijk} + b_{01j} X_k + c_{01k} W_j + b_{10j} a_{ijk} + c_{10k} a_{ijk} + d_{1jk} a_{ijk} + b_{11j} X_k a_{ijk} + c_{11k} W_j a_{ijk} + b_{00j} + c_{00k} + d_{0jk} + e_{ijk}$$

first load some utility/plotting packages for mixed models

In [28]:
lib <- c("lattice", "foreign", "nlme", "lme4", "car", "descr", "effects", "xtable", "texreg", "lmtest", "Hmisc", "ggplot2", "coefplot2", "MASS")
lapply(lib, require, character.only=T)  # require() loads one package at a time, so loop over the vector
Out[28]:
  1. TRUE
  2. TRUE
  3. TRUE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. TRUE
  9. TRUE
  10. TRUE
  11. TRUE
  12. TRUE
  13. TRUE
  14. TRUE

include a nice convenience function for plotting random effects, courtesy of caracal from this post

In [2]:
## re = object of class ranef.mer
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE) {
  require(ggplot2)
  f <- function(x) {
    pv   <- attr(x, "postVar")
    cols <- 1:(dim(pv)[1])
    se   <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
    ord  <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
    pDf  <- data.frame(y=unlist(x)[ord],
                       ci=1.96*se[ord],
                       nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
                       ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
                       ind=gl(ncol(x), nrow(x), labels=names(x)))
    
    if(QQ) {  ## normal QQ-plot
      p <- ggplot(pDf, aes(nQQ, y))
      p <- p + facet_wrap(~ ind, scales="free")
      p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
    } else {  ## caterpillar dotplot
      p <- ggplot(pDf, aes(ID, y)) + coord_flip()
      if(likeDotplot) {  ## imitate dotplot() -> same scales for random effects
        p <- p + facet_wrap(~ ind)
      } else {           ## different scales for random effects
        p <- p + facet_grid(ind ~ ., scales="free_y")
      }
      p <- p + xlab("Levels") + ylab("Random effects")
    }
    
    p <- p + theme(legend.position="none")
    p <- p + geom_hline(yintercept=0)
    p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
    p <- p + geom_point(aes(size=1.2), colour="blue") 
    return(p)
  }
  
  lapply(re, f)
}
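one caveat: ggCaterpillar reads the "postVar" attribute, but newer lme4 releases store the conditional variances under "condVar" instead, so the helper can silently break depending on your lme4 version. a small fallback keeps it working either way — a sketch, where `get_post_var` is an illustrative name of my own, not part of any package:

```r
## sketch: read "postVar", falling back to the "condVar" attribute
## used by newer lme4 versions (get_post_var is an illustrative helper)
get_post_var <- function(x) {
  pv <- attr(x, "postVar")
  if (is.null(pv)) pv <- attr(x, "condVar")  # newer lme4 attribute name
  pv
}
```

inside `f()` above, `pv <- attr(x, "postVar")` would become `pv <- get_post_var(x)`.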

Read the latest data file from WMATA and remove OD pairs with negligible ridership (ridersum ≤ 1)

In [3]:
wmata <- read.dta('Data/Collapsed_Example_0616_Hiro1.dta')
wmata <- subset(wmata, ridersum > 1)
In [4]:
histogram(log(wmata$ridersum))

Fit some null models:

  1. an intercept-only model
    $Y = \theta + e$
  2. a model with intercepts that randomly vary by origin station
    $Y_{ij} = \theta + b_{00j} + e_{ij}$
  3. a model with intercepts that randomly vary by destination station
    $Y_{ik} = \theta + c_{00k} + e_{ik}$
  4. a model with 2 random intercepts (origins and destinations)
    $Y_{ijk} = \theta_0 + b_{00j} + c_{00k} + d_{0jk} + e_{ijk}$
In [5]:
null_model <- gls( log(ridersum) ~ 1, data=wmata)
null_model_o <- lmer( log(ridersum) ~ 1+ (1 | mstn_id_o), data = wmata)
null_model_d <- lmer( log(ridersum) ~ 1+ (1 | mstn_id_d), data = wmata)
null_model_od <- lmer (log(ridersum) ~ 1 +(1| mstn_id_d) + (1| mstn_id_o), data = wmata)

Test the null models against one another

first, we test whether each of the models with random intercepts is better than the fixed intercept model

In [54]:
lrtest(null_model, null_model_o)
lrtest(null_model, null_model_d)
lrtest(null_model_o, null_model_d)
Warning message:
In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "gls", updated model is of class "lmerMod"
Out[54]:
  #Df    LogLik Df    Chisq    Pr(>Chisq)
1   2 -13746.16 NA       NA            NA
2   3 -13311.89  1 868.5514 6.737529e-191
Warning message:
In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "gls", updated model is of class "lmerMod"
Out[54]:
  #Df    LogLik Df    Chisq Pr(>Chisq)
1   2 -13746.16 NA       NA         NA
2   3 -11462.13  1 4568.071          0
Out[54]:
  #Df    LogLik Df   Chisq Pr(>Chisq)
1   3 -13311.89 NA      NA         NA
2   3 -11462.13  0 3699.52          0

both random intercept models are significantly better than the fixed intercept model. Of the three models, the one with a random intercept for destinations explains the most variance.

So far, we know that there are station effects, and that destination effects are larger than origin effects

Test both random intercept models against the cross classified model

In [55]:
lrtest(null_model_o, null_model_od)
lrtest(null_model_d, null_model_od)
Out[55]:
  #Df    LogLik Df    Chisq Pr(>Chisq)
1   3 -13311.89 NA       NA         NA
2   4  -10221.1  1 6181.568          0
Out[55]:
  #Df    LogLik Df    Chisq Pr(>Chisq)
1   3 -11462.13 NA       NA         NA
2   4  -10221.1  1 2482.048          0
In [6]:
htmlreg(list(null_model_o,null_model_d, null_model_od), single.row =TRUE,type="html", custom.model.names = c("Random Origin", "Random Destination","Cross Classified"))
Out[6]:
'
Statistical models
Random Origin Random Destination Cross Classified
(Intercept) 5.08 (0.08)*** 5.07 (0.14)*** 5.05 (0.16)***
AIC 26629.78 22930.26 20450.21
BIC 26650.32 22950.80 20477.60
Log Likelihood -13311.89 -11462.13 -10221.10
Num. obs. 6961 6961 6961
Num. groups: mstn_id_o 84 84
Variance: mstn_id_o.(Intercept) 0.45 0.50
Variance: Residual 2.59 1.49 0.99
Num. groups: mstn_id_d 84 84
Variance: mstn_id_d.(Intercept) 1.58 1.62
***p < 0.001, **p < 0.01, *p < 0.05
'

likelihood ratio tests show that the cross-classified model is significantly better than each of the single random intercept models; it also has the smallest residual variance and the lowest AIC/BIC

so far so good
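one way to put numbers on "destination effects are larger than origin effects" is a variance partition using the components from the cross-classified null model above (origin 0.50, destination 1.62, residual 0.99). this is just arithmetic on the reported variances:

```r
## variance partition coefficients for the cross-classified null model,
## hard-coding the components reported above (in practice, pull them from
## VarCorr(null_model_od))
var_o <- 0.50  # origin intercept variance
var_d <- 1.62  # destination intercept variance
var_e <- 0.99  # residual variance
round(c(origin = var_o, destination = var_d, residual = var_e) /
        (var_o + var_d + var_e), 2)
#      origin destination    residual
#        0.16        0.52        0.32
```

so destinations account for roughly half of the total variance in log ridership, consistent with the likelihood ratio tests above.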

some diagnostic plots:

In [7]:
plot(null_model_od)
qqnorm(resid(null_model_od))
In [78]:
ggCaterpillar(ranef(null_model_od, condVar=TRUE))
Out[78]:
$mstn_id_d

$mstn_id_o

the residuals look ok. there are some groups clustered at the bottom, i.e. certain Os and Ds that have low levels of ridership. Apart from that, they look random

QQ plots show the residuals and random effects are approximately normal. The lines aren't perfect, but I don't think we need to worry about it yet

Build some models

the first model adds fixed effects for each OD pair (we keep the cross-classified specification)

In [9]:
model_1<- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log1p(travel_time) +
                  (1 | mstn_id_d)+ (1 | mstn_id_o),
                data = wmata)

htmlreg(model_1, type="html", single.row =TRUE,custom.model.names = "Model 1")
lrtest(null_model_od, model_1)
Out[9]:
'
Statistical models
Model 1
(Intercept) 9.71 (0.17)***
log(peak_fare_05) -1.62 (0.16)***
log(off_peak_fare_05) 2.28 (0.19)***
log(track_mile) 0.17 (0.12)
log(inv_comp_mile) -1.13 (0.09)***
log1p(travel_time) -2.28 (0.05)***
AIC 16441.27
BIC 16502.90
Log Likelihood -8211.64
Num. obs. 6961
Num. groups: mstn_id_d 84
Num. groups: mstn_id_o 84
Variance: mstn_id_d.(Intercept) 1.45
Variance: mstn_id_o.(Intercept) 0.54
Variance: Residual 0.55
***p < 0.001, **p < 0.01, *p < 0.05
'
Out[9]:
  #Df    LogLik Df    Chisq Pr(>Chisq)
1   4  -10221.1 NA       NA         NA
2   9 -8211.636  5 4018.937          0

This is already a much better model than the null_od... we cut the residual variance nearly in half (0.99 → 0.55)

Model 2 adds more fixed effects to match Hiro's specification. Note that we still haven't centered any of the variables, so the intercept terms don't really make sense and can't be interpreted. This is OK for the moment, since we're just exploring specifications, but eventually we'll need to center variables like peak_fare and track_miles that don't have a true zero. That way we can interpret the random intercepts as the group effects for OD pairs with the average fare, average trip duration, etc.
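when we get there, grand-mean centering is just subtracting the overall mean, so that a value of zero corresponds to the average OD pair — a sketch (the `center` helper name is mine, not from any package):

```r
## grand-mean centering: after this, zero means "average OD pair", so the
## random intercepts are interpretable at the average fare, travel time, etc.
## (center is an illustrative helper name)
center <- function(x) x - mean(x, na.rm = TRUE)

center(c(1, 2, 3))  # -1  0  1
```

e.g. `wmata$c_log_peak_fare <- center(log(wmata$peak_fare_05))`, then use the centered variable in the lmer() formulas.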

In [12]:
model_2 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
                   (1 | mstn_id_o) + (1 | mstn_id_d),
                 data = wmata)

htmlreg(list(model_1, model_2), type="html",single.row =TRUE, custom.model.names = c("Model 1","Model 2"))
lrtest(model_1, model_2)
Out[12]:
'
Statistical models
Model 1 Model 2
(Intercept) 9.71 (0.17)*** 1.53 (2.10)
log(peak_fare_05) -1.62 (0.16)*** -1.51 (0.15)***
log(off_peak_fare_05) 2.28 (0.19)*** 1.68 (0.19)***
log(track_mile) 0.17 (0.12) 0.12 (0.13)
log(inv_comp_mile) -1.13 (0.09)*** -0.84 (0.11)***
log1p(travel_time) -2.28 (0.05)*** -1.65 (0.06)***
log_tt_ratio2_fill 0.29 (0.03)***
tt_ratio201 -0.81 (0.09)***
log_parking_users 0.27 (0.01)***
factor(redline_OD)1 0.53 (0.04)***
factor(m25_1)1 0.45 (0.14)**
factor(m25_2)1 0.73 (0.29)*
logjobshalfUp_D 0.81 (0.12)***
log_tphpeakv2_D 0.85 (0.33)**
log_buslinescount_D 0.19 (0.15)
log_ParkingCapacityNew_O 0.12 (0.02)***
log_hh_o 0.26 (0.06)***
log_buslinescount_O 0.16 (0.13)
log(MedianHHIncome_O) -0.09 (0.16)
factor(GoodService_D)1 -0.05 (0.25)
AIC 16441.27 15755.51
BIC 16502.90 15913.02
Log Likelihood -8211.64 -7854.76
Num. obs. 6961 6961
Num. groups: mstn_id_d 84 84
Num. groups: mstn_id_o 84 84
Variance: mstn_id_d.(Intercept) 1.45 0.35
Variance: mstn_id_o.(Intercept) 0.54 0.34
Variance: Residual 0.55 0.50
***p < 0.001, **p < 0.01, *p < 0.05
'
Out[12]:
   #Df    LogLik Df    Chisq    Pr(>Chisq)
1    9 -8211.636 NA       NA            NA
2   23 -7854.756 14 713.7606 2.978478e-143

Now we can try some random coefficients

model 3 allows the coefficient for origin households to vary randomly (by origin)

In [13]:
model_3 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
                   (1 + log_hh_o| mstn_id_o) + (1 | mstn_id_d),
                 data = wmata)

htmlreg(model_3, type="html", single.row =TRUE,custom.model.names = "Model 3")
lrtest(model_2, model_3)
Out[13]:
'
Statistical models
Model 3
(Intercept) 3.88 (1.93)*
log(peak_fare_05) -1.50 (0.15)***
log(off_peak_fare_05) 1.68 (0.19)***
log(track_mile) 0.12 (0.13)
log(inv_comp_mile) -0.84 (0.11)***
log_tt_ratio2_fill 0.28 (0.03)***
tt_ratio201 -0.81 (0.09)***
log_parking_users 0.27 (0.01)***
factor(redline_OD)1 0.53 (0.04)***
factor(m25_1)1 0.37 (0.13)**
factor(m25_2)1 0.56 (0.27)*
logjobshalfUp_D 0.81 (0.12)***
log_tphpeakv2_D 0.88 (0.33)**
log_buslinescount_D 0.21 (0.15)
log_ParkingCapacityNew_O 0.09 (0.02)***
log_hh_o 0.35 (0.08)***
log_buslinescount_O 0.23 (0.12)
log1p(travel_time) -1.65 (0.06)***
log(MedianHHIncome_O) -0.34 (0.14)*
factor(GoodService_D)1 -0.06 (0.25)
AIC 15753.69
BIC 15924.89
Log Likelihood -7851.85
Num. obs. 6961
Num. groups: mstn_id_o 84
Num. groups: mstn_id_d 84
Variance: mstn_id_o.(Intercept) 0.66
Variance: mstn_id_o.log_hh_o 0.09
Variance: mstn_id_d.(Intercept) 0.35
Variance: Residual 0.50
***p < 0.001, **p < 0.01, *p < 0.05
'
Out[13]:
   #Df    LogLik Df    Chisq Pr(>Chisq)
1   23 -7854.756 NA       NA         NA
2   25 -7851.845  2 5.820281 0.05446807

not a huge difference (p ≈ 0.054). let's try only a random coefficient for destination jobs

In [14]:
model_4 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
                   (1 | mstn_id_o) + (1 + logjobshalfUp_D | mstn_id_d),
                 data = wmata)
lrtest(model_2, model_4)
lrtest(model_3, model_4)
Out[14]:
   #Df    LogLik Df    Chisq  Pr(>Chisq)
1   23 -7854.756 NA       NA          NA
2   25 -7845.053  2 19.40467 6.11405e-05
Out[14]:
   #Df    LogLik Df    Chisq Pr(>Chisq)
1   25 -7851.845 NA       NA         NA
2   25 -7845.053  0 13.58439          0

the random jobs coefficient outperforms the random households coefficient

now a model with both of the random coefficients we tried

In [15]:
model_5 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) +
                   (1 + log_hh_o | mstn_id_o) + (1 + logjobshalfUp_D | mstn_id_d),
                 data = wmata)
lrtest(model_2, model_5)
lrtest(model_4, model_5)
Out[15]:
   #Df    LogLik Df    Chisq   Pr(>Chisq)
1   23 -7854.756 NA       NA           NA
2   26 -7841.505  3 26.50182 7.487352e-06
Out[15]:
   #Df    LogLik Df    Chisq  Pr(>Chisq)
1   25 -7845.053 NA       NA          NA
2   26 -7841.505  1 7.097145 0.007720684

More diagnostics

to make sure everything is still kosher

In [16]:
plot(model_5)
qqnorm(resid(model_5))
qqmath(ranef(model_5))
Out[16]:
$mstn_id_o

$mstn_id_d
In [17]:
htmlreg(list(null_model_od, model_1, model_2, model_3,model_4, model_5),single.row =TRUE, caption = "Model Comparison", custom.model.names =c("Null", "Reduced", "Full", "Random HH", "Random Jobs", "Dual Random"))
Out[17]:
'
Model Comparison
Null Reduced Full Random HH Random Jobs Dual Random
(Intercept) 5.05 (0.16)*** 9.71 (0.17)*** 1.53 (2.10) 3.88 (1.93)* 1.80 (1.95) 3.82 (1.70)*
log(peak_fare_05) -1.62 (0.16)*** -1.51 (0.15)*** -1.50 (0.15)*** -1.55 (0.15)*** -1.54 (0.15)***
log(off_peak_fare_05) 2.28 (0.19)*** 1.68 (0.19)*** 1.68 (0.19)*** 1.72 (0.19)*** 1.72 (0.19)***
log(track_mile) 0.17 (0.12) 0.12 (0.13) 0.12 (0.13) 0.11 (0.13) 0.11 (0.13)
log(inv_comp_mile) -1.13 (0.09)*** -0.84 (0.11)*** -0.84 (0.11)*** -0.84 (0.11)*** -0.85 (0.11)***
log1p(travel_time) -2.28 (0.05)*** -1.65 (0.06)*** -1.65 (0.06)*** -1.64 (0.06)*** -1.64 (0.06)***
log_tt_ratio2_fill 0.29 (0.03)*** 0.28 (0.03)*** 0.29 (0.03)*** 0.29 (0.03)***
tt_ratio201 -0.81 (0.09)*** -0.81 (0.09)*** -0.83 (0.09)*** -0.83 (0.09)***
log_parking_users 0.27 (0.01)*** 0.27 (0.01)*** 0.27 (0.01)*** 0.27 (0.01)***
factor(redline_OD)1 0.53 (0.04)*** 0.53 (0.04)*** 0.52 (0.04)*** 0.52 (0.04)***
factor(m25_1)1 0.45 (0.14)** 0.37 (0.13)** 0.42 (0.11)*** 0.37 (0.11)***
factor(m25_2)1 0.73 (0.29)* 0.56 (0.27)* 0.65 (0.23)** 0.56 (0.22)*
logjobshalfUp_D 0.81 (0.12)*** 0.81 (0.12)*** 1.06 (0.11)*** 1.09 (0.10)***
log_tphpeakv2_D 0.85 (0.33)** 0.88 (0.33)** 0.64 (0.21)** 0.77 (0.16)***
log_buslinescount_D 0.19 (0.15) 0.21 (0.15) 0.02 (0.10) 0.03 (0.10)
log_ParkingCapacityNew_O 0.12 (0.02)*** 0.09 (0.02)*** 0.12 (0.02)*** 0.09 (0.02)***
log_hh_o 0.26 (0.06)*** 0.35 (0.08)*** 0.26 (0.06)*** 0.35 (0.08)***
log_buslinescount_O 0.16 (0.13) 0.23 (0.12) 0.18 (0.13) 0.23 (0.12)*
log(MedianHHIncome_O) -0.09 (0.16) -0.34 (0.14)* -0.09 (0.16) -0.34 (0.14)*
factor(GoodService_D)1 -0.05 (0.25) -0.06 (0.25) 0.12 (0.16)
AIC 20450.21 16441.27 15755.51 15753.69 15740.11 15735.01
BIC 20477.60 16502.90 15913.02 15924.89 15911.31 15913.06
Log Likelihood -10221.10 -8211.64 -7854.76 -7851.85 -7845.05 -7841.50
Num. obs. 6961 6961 6961 6961 6961 6961
Num. groups: mstn_id_d 84 84 84 84 84 84
Num. groups: mstn_id_o 84 84 84 84 84 84
Variance: mstn_id_d.(Intercept) 1.62 1.45 0.35 0.35 3.21 3.23
Variance: mstn_id_o.(Intercept) 0.50 0.54 0.34 0.66 0.34 0.66
Variance: Residual 0.99 0.55 0.50 0.50 0.50 0.50
Variance: mstn_id_o.log_hh_o 0.09 0.09
Variance: mstn_id_d.logjobshalfUp_D 0.12 0.13
***p < 0.001, **p < 0.01, *p < 0.05
'

So far, the results show that we definitely want to include a cross-classified specification, and that both random intercepts are significant. We get much better likelihoods and fit criteria moving from the null model to model 2.

In models 3, 4, and 5, we add random coefficients. The likelihoods improve, but only marginally, and we stop explaining additional variance: the residual is stable at 0.50. The likelihood ratio test for the random household slope (model 2 vs. model 3) is just barely insignificant (p ≈ 0.054). This means that allowing the coefficient for destination jobs to vary adds information, but allowing origin households to vary does not add much.
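the "stop explaining additional variance" point can be made concrete with a proportional-reduction-in-variance statistic (a level-1 pseudo-R² in the Raudenbush & Bryk sense), computed from the residual variances reported in the comparison table above:

```r
## proportional reduction in residual (level-1) variance relative to the
## cross-classified null model, using the variances reported above
resid_null <- 0.99
resid_fit  <- c(model_1 = 0.55, model_2 = 0.50, model_5 = 0.50)
round(1 - resid_fit / resid_null, 2)
# model_1 model_2 model_5
#    0.44    0.49    0.49
```

models 2 through 5 all explain about half of the within-pair variance; the random slopes buy likelihood, not residual variance.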

Once we get the fixed effects specified well, I think we'll be in really good shape

In [26]:
model_6 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) +
                   (1 | mstn_id_o) + (1 + factor(GoodService_D) + logjobshalfUp_D | mstn_id_d),
                 data = wmata)
lrtest(model_4, model_6)
Out[26]:
   #Df    LogLik Df    Chisq Pr(>Chisq)
1   25 -7845.053 NA       NA         NA
2   28 -7841.278  3 7.549562 0.05629857
In [39]:
model_7 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
                   (1 | mstn_id_o) + (1 | mstn_id_d),
                 data = wmata)
lrtest(model_4, model_7)
htmlreg(list(model_4, model_7), single.row=TRUE)
Out[39]:
   #Df    LogLik Df    Chisq   Pr(>Chisq)
1   25 -7845.053 NA       NA           NA
2   24 -7854.747 -1 19.38699 1.067314e-05
Out[39]:
'
Statistical models
Model 1 Model 2
(Intercept) 1.80 (1.95) 3.06 (2.28)
log(peak_fare_05) -1.55 (0.15)*** -1.49 (0.15)***
log(off_peak_fare_05) 1.72 (0.19)*** 1.67 (0.19)***
log(track_mile) 0.11 (0.13) 0.13 (0.13)
log(inv_comp_mile) -0.84 (0.11)*** -0.84 (0.11)***
log_tt_ratio2_fill 0.29 (0.03)*** 0.29 (0.03)***
tt_ratio201 -0.83 (0.09)*** -0.81 (0.09)***
log_parking_users 0.27 (0.01)*** 0.27 (0.01)***
factor(redline_OD)1 0.52 (0.04)*** 0.53 (0.04)***
factor(m25_1)1 0.42 (0.11)*** 0.43 (0.14)**
factor(m25_2)1 0.65 (0.23)** 0.69 (0.29)*
logjobshalfUp_D 1.06 (0.11)*** 0.74 (0.13)***
log_tphpeakv2_D 0.64 (0.21)** 0.52 (0.38)
log_buslinescount_D 0.02 (0.10) 0.22 (0.14)
log_ParkingCapacityNew_O 0.12 (0.02)*** 0.12 (0.02)***
log_hh_o 0.26 (0.06)*** 0.26 (0.06)***
log_buslinescount_O 0.18 (0.13) 0.17 (0.13)
log1p(travel_time) -1.64 (0.06)*** -1.66 (0.06)***
log(MedianHHIncome_O) -0.09 (0.16) -0.09 (0.16)
factor(GoodService_D)1 0.12 (0.16) 0.05 (0.25)
log(MilesfromCore_D) -0.16 (0.10)
AIC 15740.11 15757.49
BIC 15911.31 15921.85
Log Likelihood -7845.05 -7854.75
Num. obs. 6961 6961
Num. groups: mstn_id_o 84 84
Num. groups: mstn_id_d 84 84
Variance: mstn_id_o.(Intercept) 0.34 0.34
Variance: mstn_id_d.(Intercept) 3.21 0.34
Variance: mstn_id_d.logjobshalfUp_D 0.12
Variance: Residual 0.50 0.50
***p < 0.001, **p < 0.01, *p < 0.05
'
In [40]:
model_8 <- lmer( log(ridersum) ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
                   (1 | mstn_id_o) + (1 | mstn_id_d),
                 data = wmata)
lrtest(model_7, model_8)
htmlreg(list(model_7, model_8), single.row=TRUE)
Out[40]:
   #Df    LogLik Df Chisq Pr(>Chisq)
1   24 -7854.747 NA    NA         NA
2   24 -7854.747  0     0          1
Out[40]:
'
Statistical models
Model 1 Model 2
(Intercept) 3.06 (2.28) 3.06 (2.28)
log(peak_fare_05) -1.49 (0.15)*** -1.49 (0.15)***
log(off_peak_fare_05) 1.67 (0.19)*** 1.67 (0.19)***
log(track_mile) 0.13 (0.13) 0.13 (0.13)
log(inv_comp_mile) -0.84 (0.11)*** -0.84 (0.11)***
log_tt_ratio2_fill 0.29 (0.03)*** 0.29 (0.03)***
tt_ratio201 -0.81 (0.09)*** -0.81 (0.09)***
log_parking_users 0.27 (0.01)*** 0.27 (0.01)***
factor(redline_OD)1 0.53 (0.04)*** 0.53 (0.04)***
factor(m25_1)1 0.43 (0.14)** 0.43 (0.14)**
factor(m25_2)1 0.69 (0.29)* 0.69 (0.29)*
logjobshalfUp_D 0.74 (0.13)*** 0.74 (0.13)***
log_tphpeakv2_D 0.52 (0.38) 0.52 (0.38)
log_buslinescount_D 0.22 (0.14) 0.22 (0.14)
log_ParkingCapacityNew_O 0.12 (0.02)*** 0.12 (0.02)***
log_hh_o 0.26 (0.06)*** 0.26 (0.06)***
log_buslinescount_O 0.17 (0.13) 0.17 (0.13)
log1p(travel_time) -1.66 (0.06)*** -1.66 (0.06)***
log(MedianHHIncome_O) -0.09 (0.16) -0.09 (0.16)
factor(GoodService_D)1 0.05 (0.25) 0.05 (0.25)
log(MilesfromCore_D) -0.16 (0.10) -0.16 (0.10)
AIC 15757.49 15757.49
BIC 15921.85 15921.85
Log Likelihood -7854.75 -7854.75
Num. obs. 6961 6961
Num. groups: mstn_id_o 84 84
Num. groups: mstn_id_d 84 84
Variance: mstn_id_o.(Intercept) 0.34 0.34
Variance: mstn_id_d.(Intercept) 0.34 0.34
Variance: Residual 0.50 0.50
***p < 0.001, **p < 0.01, *p < 0.05
'
In [18]:
coefplot2(model_4)
In [22]:
## residPlots() isn't a function in any of the loaded packages; the base
## plot method for merMod objects gives the residuals-vs-fitted plot instead
plot(model_5)
In [42]:
xyplot(residuals(model_4) ~ fitted(model_4),
  panel=function(x, y){ 
    panel.xyplot(x, y) 
    panel.loess(x, y, span = 0.75) 
    panel.lmline(x, y, lty = 2)  # least-squares line (dashed)
  } 
)
In [34]:
glm_model_1 <- glmer(ridersum ~ log(peak_fare_05) + log(off_peak_fare_05) + log(track_mile) + log(inv_comp_mile) + log_tt_ratio2_fill + tt_ratio201 + log_parking_users + factor(redline_OD) + factor(m25_1) + factor(m25_2) + logjobshalfUp_D + log_tphpeakv2_D + log_buslinescount_D + log_ParkingCapacityNew_O + log_hh_o + log_buslinescount_O +
                   log1p(travel_time) + log(MedianHHIncome_O) + factor(GoodService_D) + log(MilesfromCore_D) +
                   (1 | mstn_id_o) + (1 | mstn_id_d), data = wmata, family = poisson)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
In [35]:
plot(glm_model_1)
In [36]:
lrtest(model_4, glm_model_1)
Warning message:
In modelUpdate(objects[[i - 1]], objects[[i]]): original model was of class "lmerMod", updated model is of class "glmerMod"
Out[36]:
   #Df     LogLik Df   Chisq Pr(>Chisq)
1   25  -7845.053 NA      NA         NA
2   23    -615949 -2 1216208          0
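two caveats here. first, this likelihood ratio test isn't meaningful: model_4 is fit to log(ridersum) while glm_model_1 is fit to ridersum itself, so the two likelihoods are on different scales. second, before trusting a Poisson specification we'd want to check for overdispersion: the sum of squared Pearson residuals divided by the residual degrees of freedom should be near 1. a sketch on a toy glm with simulated data (for the mixed model, the same ratio can be computed with residuals(glm_model_1, type = "pearson") and df.residual(glm_model_1)):

```r
## overdispersion check: Pearson chi-square over residual df, which should
## be close to 1 if the Poisson mean-variance assumption holds
## (sketch on simulated data; overdisp_ratio is an illustrative helper)
overdisp_ratio <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- rpois(200, exp(0.5 + 0.3 * toy$x))
fit <- glm(y ~ x, data = toy, family = poisson)
overdisp_ratio(fit)  # close to 1 for well-behaved Poisson data
```

a ratio much greater than 1 for glm_model_1 would point toward a negative binomial or an observation-level random effect instead.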
In [37]:
htmlreg(list(model_4, glm_model_1), single.row=TRUE)
Out[37]:
'
Statistical models
Model 1 Model 2
(Intercept) 1.80 (1.95) -2.50 (1.21)*
log(peak_fare_05) -1.55 (0.15)*** -1.29 (0.01)***
log(off_peak_fare_05) 1.72 (0.19)*** 0.93 (0.01)***
log(track_mile) 0.11 (0.13) 0.48 (0.01)***
log(inv_comp_mile) -0.84 (0.11)*** -0.54 (0.01)***
log_tt_ratio2_fill 0.29 (0.03)*** 0.16 (0.00)***
tt_ratio201 -0.83 (0.09)*** -0.39 (0.01)***
log_parking_users 0.27 (0.01)*** 0.26 (0.00)***
factor(redline_OD)1 0.52 (0.04)*** 0.90 (0.00)***
factor(m25_1)1 0.42 (0.11)*** 0.46 (0.13)***
factor(m25_2)1 0.65 (0.23)** 0.89 (0.25)***
logjobshalfUp_D 1.06 (0.11)*** 0.64 (0.11)***
log_tphpeakv2_D 0.64 (0.21)** 0.39 (0.30)
log_buslinescount_D 0.02 (0.10) 0.28 (0.13)*
log_ParkingCapacityNew_O 0.12 (0.02)*** 0.07 (0.02)***
log_hh_o 0.26 (0.06)*** 0.20 (0.05)***
log_buslinescount_O 0.18 (0.13) 0.20 (0.11)
log1p(travel_time) -1.64 (0.06)*** -1.38 (0.00)***
log(MedianHHIncome_O) -0.09 (0.16) 0.46 (0.11)***
factor(GoodService_D)1 0.12 (0.16) -0.05 (0.21)
log(MilesfromCore_D) -0.25 (0.08)**
AIC 15740.11 1231943.98
BIC 15911.31 1232101.49
Log Likelihood -7845.05 -615948.99
Num. obs. 6961 6961
Num. groups: mstn_id_o 84 84
Num. groups: mstn_id_d 84 84
Variance: mstn_id_o.(Intercept) 0.34 0.26
Variance: mstn_id_d.(Intercept) 3.21 0.28
Variance: mstn_id_d.logjobshalfUp_D 0.12
Variance: Residual 0.50 1.00
***p < 0.001, **p < 0.01, *p < 0.05
'